The ICT,CAS MT Systems for the IWSLT09 Evaluation
Abstract
We used only the data provided by the organizers for each task. We first used the Chinese lexical analysis system ICTCLAS to segment Chinese character sequences into words and a rule-based tokenizer to tokenize English sentences. We then converted all alphanumeric characters to their 2-byte representation. Finally, we ran GIZA++ and applied the “grow-diag-final” heuristic to obtain many-to-many word alignments. We used the SRI Language Modeling Toolkit to train 5-gram Chinese and English language models with Kneser-Ney smoothing on the Chinese and English sides of the training corpus, respectively. For Silenus, we used the Chinese parser of Xiong et al. (2006) and the English parser of Charniak et al. (2005) to parse the source and target sides of the bilingual corpus into packed forests, respectively. We then pruned the forests using the inside-outside algorithm based on marginal probabilities, with a pruning threshold pe = 3. At decoding time, we used a larger pruning threshold, pd = 12, to generate the packed forest.
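The character normalization step can be illustrated with a minimal sketch. The abstract does not say how the 2-byte conversion was implemented; the version below assumes it means mapping ASCII letters and digits to their Unicode full-width forms, which occupy two bytes in common Chinese encodings such as GB2312. The function name to_fullwidth_alnum is hypothetical and not taken from the paper.

```python
# Assumption: "2-byte representation" is read here as the Unicode full-width
# forms of ASCII letters and digits. This is an illustration, not the
# authors' actual preprocessing code.

# Fixed distance between an ASCII character and its full-width counterpart
# (for example 'A' U+0041 -> U+FF21).
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth_alnum(text: str) -> str:
    """Map ASCII letters and digits to full-width forms; keep all else as-is."""
    converted = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            converted.append(chr(ord(ch) + FULLWIDTH_OFFSET))
        else:
            # Chinese characters, punctuation, and spaces pass through unchanged.
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    print(to_fullwidth_alnum("IWSLT09 评测"))
```

Normalizing in this way would make tokens such as "IWSLT09" identical whether they were typed in half-width or full-width form, which is presumably why it precedes word alignment.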
Similar resources
LIG approach for IWSLT09 : using multiple morphological segmenters for spoken language translation of Arabic
This paper describes the LIG experiments in the context of IWSLT09 evaluation (Arabic to English Statistical Machine Translation task). Arabic is a morphologically rich language, and recent experimentations in our laboratory have shown that the performance of Arabic to English SMT systems varies greatly according to the Arabic morphological segmenters applied. Based on this observation, we prop...
Using Example-Based MT to Support Statistical MT when Translating Homogeneous Data in a Resource-Poor Setting
In this paper, we address the issue of applying example-based machine translation (EBMT) methods to overcome some of the difficulties encountered with statistical machine translation (SMT) techniques. We adopt two different EBMT approaches and present an approach to augment output quality by strategically combining both EBMT approaches with the SMT system to handle issues arising from the use o...
CRISPR-Cas: the effective immune systems in the prokaryotes
Approximately all sequenced archaeal and half of eubacterial genomes have some sort of adaptive immune system, which enables them to target and cleave invading foreign genetic elements by an RNAi-like pathway. CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems consist of the CRISPR loci with multiple copies of a short repeat sequence separa...
Pnm-25: Nursing Information Systems: Issues and Challenges
Background: The nursing process is often considered as core of the nursing care delivery and guides the care documentation. Currently, with rapid advance in Information and Communication Technology (ICT) this process can be supported electronically. Applying information systems improves care health processes. Nursing Information Systems (NISs) deal with nursing process. Materials and Methods: E...
The application and mechanism of CRISPR-Cas systems in the treatment of infectious diseases
Infectious diseases remain a global threat, with many people contracting epidemic diseases every year. Improved understanding of the pathogenesis of bacteria, viruses, fungi, and parasites, along with rapid diagnosis and treatment of human infections, is essential to improving infectious disease outcomes worldwide. In many genomic loci in bacteria and archaea, termed Clustered Regularly Inters...
Publication date: 2009